Learning Paraphrase Identification with Structural Alignment
Abstract
Semantic similarity of text plays an important role in many NLP tasks. It requires using both local information, such as lexical semantics, and structural information, such as syntactic structures. Recent progress in word representation provides good resources for lexical semantics, and advances in natural language analysis tools make it possible to efficiently generate syntactic and semantic annotations. However, how to combine them to capture the semantics of text is still an open question. Here, we propose a new alignment-based approach to learn semantic similarity. It uses a hybrid representation, attributed relational graphs, to encode lexical, syntactic and semantic information. Alignment of two such graphs combines local and structural information to support similarity estimation. To improve alignment, we introduce structural constraints inspired by a cognitive theory of similarity and analogy. Usually only similarity labels are given in training data and the true alignments are unknown, so we address the learning problem using two approaches: alignment as feature extraction and alignment as latent variable. Our approach is evaluated on the paraphrase identification task and achieves results competitive with the state of the art.
Similar resources
Semi-Markov Phrase-Based Monolingual Alignment
We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of...
Discriminative Phrase Embedding for Paraphrase Identification
This work on the paraphrase identification task contributes, on one hand, to expanding deep learning embeddings to include continuous and discontinuous linguistic phrases. On the other hand, it proposes a new scheme, TF-KLD-KNN, to learn the discriminative weights of words and phrases specific to the paraphrase task, so that a weighted sum of embeddings can represent sentences more effective...
Paraphrase Identification by Text Canonicalization
This paper proposes an approach to sentence-level paraphrase identification by text canonicalization. The source sentence pairs are first converted into surface text that approximates canonical forms. A decision tree learning module that employs simple lexical matching features then takes the canonicalized texts as its input for a supervised learning process. Experiments on the Microsoft...
AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders
We explore using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system makes use of phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on computing semantic s...
Learning Paraphrase Models from Google New Headlines
Data sources like the clusters of news headlines at Google News present an exciting opportunity to learn paraphrase models from data automatically. We present both a novel dataset and a novel approach to automatic, unsupervised learning of paraphrase models from that dataset. Leveraging existing NLP tools such as the Stanford Parser and lexical resources such as WordNet and Infomap, we construct...